{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 聚类\n",
    "\n",
    "熟悉各中聚类算法的调用\n",
    "并用评价指标选择合适的超参数\n",
    "活动描述信息在events.csv文件：共110维特征\n",
    "前9列：event_id, user_id, start_time, city, state, zip, country, lat, and lng.\n",
    "event_id：活动的id, \n",
    "user_id：创建活动的用户的id .  \n",
    "city, state, zip, and country： 活动地点 (如果知道的话).\n",
    "lat and lng： floats（活动地点的经度和纬度）\n",
    "start_time： 字符串，ISO-8601 UTC time，表示活动开始时间\n",
    "\n",
    "后101列为词频：count_1, count_2, ..., count_100，count_other\n",
    "count_N：活动描述出现第N个词的次数\n",
    "count_other：除了最常用的100个词之外的其余词出现的次数\n",
    "\n",
    "作业要求：\n",
    "根据活动的关键词（count_1, count_2, ..., count_100，count_other属性）做聚类，可采用KMeans聚类\n",
    "尝试K=10，20，30，..., 100, 并计算各自CH_scores。\n",
    "\n",
    "提示：由于样本数目较多，建议使用MiniBatchKMeans。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "#导入必要的工具包\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.cluster import MiniBatchKMeans\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn import metrics\n",
    "import scipy.io as sio\n",
    "#from sklearn.decomposition import PCA\n",
    "import time\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "#读取训练数据\n",
    "eventContMatrix = sio.mmread(\"EV_eventContMatrix\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#将像素值[0,255]  --> [0,1]\n",
    "X_train = X_train / 255.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "scrolled": true
   },
   "source": [
    "# 原始输入的特征维数和样本数目\n",
    "print('the shape of train_image: {}'.format(X_train.shape))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#对数据进行PCA降维\n",
    "pca = PCA(n_components=0.75)\n",
    "pca.fit(X_train)\n",
    "\n",
    "X_train_pca = pca.transform(X_train)\n",
    "\n",
    "# 降维后的特征维数\n",
    "print(X_train_pca.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 将训练集合拆分成训练集和校验集，在校验集上找到最佳的模型超参数（PCA的维数）\n",
    "X_train_part, X_val, y_train_part, y_val = train_test_split(X_train,y_train, train_size = 0.8,random_state = 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "scrolled": true
   },
   "source": [
    "#拆分后的训练集和校验集的样本数目\n",
    "print(X_train_part.shape)\n",
    "print(X_val.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 一个参数点（聚类数据为K）的模型，在校验集上评价聚类算法性能\n",
    "def K_cluster_analysis(K, Matrix):\n",
    "    start = time.time()\n",
    "    \n",
    "    print(\"K-means begin with clusters: {}\".format(K));\n",
    "    \n",
    "    #K-means,在训练集上训练\n",
    "    mb_kmeans = MiniBatchKMeans(n_clusters = K)\n",
    "    mb_kmeans.fit(Matrix)\n",
    "    \n",
    "    # 在训练集和测试集上测试\n",
    "    #y_train_pred = mb_kmeans.fit_predict(X_train)\n",
    "    result = mb_kmeans.predict(Matrix)\n",
    "    \n",
    "    #以前两维特征打印训练数据的分类结果\n",
    "    #plt.scatter(X_train[:, 0], X_train[:, 1], c=y_pred)\n",
    "    #plt.show()\n",
    "\n",
    "    # K值的评估标准\n",
    "    #常见的方法有轮廓系数Silhouette Coefficient和Calinski-Harabasz Index\n",
    "    #这两个分数值越大则聚类效果越好\n",
    "    #CH_score = metrics.calinski_harabaz_score(X_train,mb_kmeans.predict(X_train))\n",
    "    CH_score = metrics.silhouette_score(Matrix,result)\n",
    "    \n",
    "    #也可以在校验集上评估K\n",
    "    ###v_score = metrics.v_measure_score(y_val, y_val_pred)\n",
    "    \n",
    "    end = time.time()\n",
    "    print(\"CH_score: {}, time elaps:{}\".format(CH_score, int(end-start)))\n",
    "    ###print(\"v_score: {}\".format(v_score))\n",
    "    \n",
    "    return CH_score###,v_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K-means begin with clusters: 10\n",
      "CH_score: 0.15334441883250638, time elaps:40\n",
      "K-means begin with clusters: 20\n",
      "CH_score: 0.13384652390546367, time elaps:18\n",
      "K-means begin with clusters: 30\n",
      "CH_score: 0.15464525128955411, time elaps:15\n",
      "K-means begin with clusters: 40\n",
      "CH_score: 0.15342015056744307, time elaps:15\n",
      "K-means begin with clusters: 50\n",
      "CH_score: 0.15508252253741733, time elaps:15\n",
      "K-means begin with clusters: 60\n",
      "CH_score: 0.10738250944103446, time elaps:15\n",
      "K-means begin with clusters: 70\n",
      "CH_score: 0.1371370511392862, time elaps:15\n",
      "K-means begin with clusters: 80\n",
      "CH_score: 0.14273793844828214, time elaps:16\n",
      "K-means begin with clusters: 90\n",
      "CH_score: 0.12442431763566943, time elaps:16\n",
      "K-means begin with clusters: 100\n",
      "CH_score: 0.12573799481633943, time elaps:16\n"
     ]
    }
   ],
   "source": [
    "# 设置超参数（聚类数目K）搜索范围\n",
    "Ks = [10,20,30,40,50,60,70,80,90,100]\n",
    "CH_scores = []\n",
    "###v_scores = []\n",
    "for K in Ks:\n",
    "    ch = K_cluster_analysis(K, eventContMatrix)\n",
    "    ###ch,v = K_cluster_analysis(K, X_train_part, y_train_part, X_val, y_val)\n",
    "    CH_scores.append(ch)\n",
    "    ###v_scores.append(v)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x7f58b050a7f0>]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAD8CAYAAACb4nSYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzt3XmUVPWd9/H3l102FWmJIqsiS5c7Eg2KC5LRRDEm4hYXXIIZIepg8oyTSfKcZJYkZ5KJmWiM0GqMCy5JjIrGLkBREgPDoiyCbKKAGsEFF5Cl4ff88a16erGxq7tu1a2u+3md06e6q6rrfrsoPnXrt1oIARERSYY2cRcgIiLFo9AXEUkQhb6ISIIo9EVEEkShLyKSIAp9EZEEUeiLiCSIQl9EJEEU+iIiCdIu7gIa6tmzZ+jfv3/cZYiItCoLFy58J4RQ0dT9Si70+/fvz4IFC+IuQ0SkVTGz13O5n5p3REQSRKEvIpIgCn0RkQRR6IuIJIhCX0QkQRT6IiIJotAXEUmQkhunLxKlrVvhpZdg0SIYORKOPTbuikTipdCXsrF9OyxeDAsW1H4tXw579vjtp50GzzwTb40icVPol6nFi+GnP4V+/eDww2HwYL/s2TPuyqKxcycsW1Y/4JcuhZoav72iAo4/Hr76VRg+HO69F2bPjrVkkZKg0C9TP/whTJ8OIdQGIUCPHvXfBLKXhx0G++wTX72fpaYGVqyoH/CLF8OOHX77/vt7sH/nO345fDj06QNmtY+xdi088ghs2gQHHhjP3yFSChT6Zejvf4cnnoAbb4Qf/xjWrYNVq2DlSr9ctQpmzIB77qn9HTPo27fxN4S+faFNkbr89+zx+hYsgPnz/fLFF+GTT/z2bt3guOPg+utrA37AgPoB35hUyi+XLYPTTy/s3yBSyhT6Zeiee/zs+OqroV07GDTIv7785fr3++gjWL36028I99zjt2V17Oi/n30TqPuGcMABLa8zBHj11dpwX7DAO1yzx+7c2Tter722NuAHDWrZG5BCX8Qp9MtMCFBVBSefDEOGfPZ9u3XzUG04oiUEePvt2jeC7OXSpfDYY/Wbiw44oP6bQPb7ww6DTp3qP+b69fWbaBYsgC1b/PaOHeHoo+Hyy2sDfsgQf9OKQq9eXuvLL0fzeCKtVdmE/u7d8NRTHhYHHRR3NfF57jlYswa+//2WP4YZfO5z/nXKKfVv27ULXnvt028I1dXw29/Wf4xsJ3KbNrBwIWze7Le1awdHHgkXXOD/XscfD5WV0L59y2vO5W+qrPQzfZEkK5vQX78ezj3Xw+6HP4y7mvhUVcG++8L55xfm8du3r20uaijbXNTwDWHXLjj7bA/34cPhiCPqfwoollQK7rvPP3U01QcgUq7KJvQHDICzzoIpU+Bf/xU6dIi7ouJ77z34/e+9Lb9z5+Iff2/NRaUilYIPP4SNG310j0gSldUyDBMn+siVRx+Nu5J43H+/D2P8xjfirqQ01e3MFUmqsgr9M8+EgQPhttvirqT4QoCpU30449FHx11Naaqs9Et15kqSlVXot2kD110Hc+bAkiVxV1Nc8+f76Jprrom7ktLVo4d38utMX5KsrEIf4MorvZMwaWf7VVXejn/JJXFXUtpSKYW+JFvZhX6PHh58991XOwa83H38MUyb5kMgu3ePu5rSlkr5Imy7d8ddiUg8yi70wTt0t22rv8xAOXvoIQ9+deA2LZXyJR3WrYu7EpF4lGXoH3ssnHAC/PrXtcvqlrOpU2HoUDjxxLgrKX3ZETzqzJWkKsvQBz/bX7UKZs2Ku5LCWroU5s3zs3xNOGrasGF+qXZ9SaqyDf1x43xN9XLv0K2q8olol10WdyWtQ9eu0L+/Ql+SK6fQN7MzzWylma0xs5sbuX2UmS0ysxozO7/BbbvN7KXM1+NRFd6Ujh397PeJJ+D114t11OLavt03BznvvPLZHKUYNIJHkqzJ0DeztsBtwFnAMOBiMxvW4G7rgfHAA408xCchhKMzX2PzrLdZrr3WL3/zm2IetXgefRTef19j85srlfJ1gXbtirsSkeLL5Ux/BLAmhPBqCGEn8CBwbt07hBBeCyEsAUqq27RvXxg71ptAtm+Pu5roTZ3qaw5pffjmSaU88FevjrsSkeLLJfR7Axvq/Lwxc12uOpnZAjOba2ZfaewOZjYhc58Fm7Pr70Zk4kR45x3fKq+crFkDzz7ri6sVa1ercpFdjkFNPJJEucRFY2NCQjOO0TeEMBy4BLjFzA791IOFMCWEMDyEMLyioqIZD9200aN9U49y69C9804P+/Hj466k9RkyxJ87hb4kUS6hvxGouxDtIcCbuR4ghPBm5vJVYDZwTDPqy5uZr8czb55v5FEOdu3yDUu+/GXo3ZzPXAL4Mh2DBin0JZlyCf35wCAzG2BmHYCLgJxG4ZjZ/mbWMfN9T2AksLylxbbUFVdAly7lc7b/5JO+hLQ6cFtOI3gkqZoM/RBCDTAJqAZWAA+HEF42sx+Z2VgAMzvezDYC44A7zCw733EosMDMFgPPAj8JIRQ99Pfd18exT5sG775b7KNHr6rKV4v80pfirqT1SqVg7VpfkkEkSXLqAgwhPBVCODyEcGgI4T8y1/0ghPB45vv5IYRDQghdQggHhBAqM9e/EEI4IoRwVObyzsL9KZ9t4kQfwXPXXXFVEI2NG+HPf/bVRKPaNDyJKit9iY5XXom7EpHiSsy4j1QKRo2C229v3Sss3n23h9XVV8ddSeumXbQkqRIT+uBn++vWwdNPx11Jy+zZ46N2Ro/2HcKk5Q47zJevUOhL0iQq9M87z9vCW2uH7syZvqSEllDOX/v2PnRToS9Jk6jQb98eJkzwM/21a+OupvmqquCAA+ArjU5xk+ZKpbTEsiRPokIfPPTbtvW2/dZk82b405/g8st9MTnJX2Wlf3L68MO4KxEpnsSF/sEHw1e/6m3j27bFXU3ufvc7n5SlDtzoZDtzlxd9ELFIfBIX+uAdulu2+Lj91iAEX1ztxBNr142R/GkEjyRRIkP/5JP9P/xtt3mglrq//tWXAlYHbrT694fOnRX6kiyJDH0zP9t/8UWYOzfuappWVQXdusEFF8RdSXlp08Y/OakzV5IkkaEPcOml0L176Q/f3LIFHn4YLrnE1w+SaFVW6kxfkiWxod+1qy/E9sgj8PbbcVezd9Om+fowWlytMFIpX7zunXfirkSkOBIb+uBLLu/c6c0npWrqVDj6aDjuuLgrKU/Zzlw18UhSJDr0hwzxJQ1+8xuoqYm7mk9btMj7Ha65xvshJHoawSNJk+jQB5g0yVeufOKJuCv5tKlTfcOPr3897krK18EHw3776UxfkiPxoX/22dCnT+l16G7dCg88AOPGeShJYZipM1eSJfGh364dfPObMGtWaa2t/vvf+/IAGptfeNldtFrDnA2RfCU+9MHbzDt0gF//Ou5Kak2dCocfDiedFHcl5S+Vgvffh7feirsSkcJT6AMHHujNKPfcAx99FHc1sGKFz8JVB25xqDNXkkShnzFxojen3Hdf3JX4ENJ27XwegRRedj0jdeZKEij0M044AY45Jv71eHbs8BU1zz3XP4FI4VVU+HOtM31JAoV+hpkP33z5ZXj++fjqeOwxnx2qDtziynbmipQ7hX4dF10E++8f7/DNqiro2xfOOCO+GpIou4vWnj1xVyJSWAr9Ojp3hquugkcfhTffLP7x162DGTN8o5S2bYt//CRLpXxuxOuvx12JSGEp9Bv4x3+E3bthypTiH/uuu3y53yuvLP6xk05r8BTf1q3wwgtxV5E8Cv0GDj0UzjwT7rjDF2MrlpoauPtuP3afPsU7rrhhw/xS7frF8cILvpDgyJE+MVKKR6HfiIkTfbndRx8t3jGffhreeENLKMdl3339zVahX1g7dsB3v+u719XUQM+e8N//HXdVyaLQb8SZZ8KAAcXt0K2qgl69fC0giYdG8BTWkiUwYgT8+Mfed7ZkCXzrW/DUUz4hUYpDod+Itm19rf05c2Dp0sIf7623YPp0GD8e2rcv/PGkcamUr79Uistst2a7d8NPfwrDh/uGRU884cuMdOvmfWidOsEtt8RdZXIo9Pfiqqv8xViMs/3f/tb/Y1x9deGPJXtXWenND2vXxl1J+VizBkaNgptv9gmHy5bV/zRbUQGXX+4TEjdvjq/OJFHo70WPHnDxxb4swwcfFO44e/Z4086pp8KgQYU7jjRNa/BEJwTfnOioo2D5crj/ft/ruWfPT9/3xhth+3a/vxSeQv8zTJzow8ruuadwx5g9G159VR24pWDoUJ+ZrdDPzxtvwFlnedPNyJHeRHrJJXtfPHDoUPjSl+DWWz38pbAU+p/huOPg85/3Jp5CzdScOtVnAX/ta4V5fMld584+ZFeh3zIhwLRp/olpzhxfqry6Gg45pOnfnTwZNm3y35fCyin0zexMM1tpZmvM7OZGbh9lZovMrMbMzm/k9u5m9oaZ3RpF0cU0cSKsWlWYscTvvgt//CNceqn3H0j8sssxSPO8+64vY3LJJX7m/tJLfqaf69Lgp58ORx7pwze1mU1hNRn6ZtYWuA04CxgGXGxmwxrcbT0wHnhgLw/zb8BzLS8zPuPGeTtkITp077vPJ4Cpaad0VFb6m/yOHXFX0no8+aS/WT76qA/HnDOn+f1TZn62v2wZzJxZmDrF5XKmPwJYE0J4NYSwE3gQOLfuHUIIr4UQlgCfagQxs+OAXkA6gnqLrlMnX/HyiSdg/froHjcEb9oZMcLPcKQ0pFI+kmrlyrgrKX0ffQQTJvhonIoKmD/fR+m0dN2oiy6Cz30Ofv7zaOuU+nIJ/d7Ahjo/b8xc1yQzawP8HPhO80srHd/8pl9GObpg3jxvRtASyqVFI3hy8/zzPjLnzjvhn//ZA/+oo/J7zI4dfXnz6mo9/4WUS+g31iqXa6vbdcBTIYQNn3UnM5tgZgvMbMHmEhys27cvnHOOD62M6mP/1KnQpQtceGE0jyfROPxw37VModO47dvhO9/xIcZt2nj4/+QnHthRuPZa2GcfTdYqpFxCfyNQdwmwQ4BcFx4+EZhkZq8BPwMuN7OfNLxTCGFKCGF4CGF4RUVFjg9dXBMn+uSRRx7J/7E+/BAefNA/znbrlv/jSXQ6dIDBg9WZ25hFi3xW7c9+5uH80ks+JDNKPXv6NqH33eezdyV6uYT+fGCQmQ0wsw7ARcDjuTx4COHrIYS+IYT+wLeB34UQPjX6pzUYPdrPAm+NYPzRgw/Ctm1q2ilVlZU606+rpgb+/d99+PL778Of/wy33w5duxbmeDfe6J+ob7+9MI+fdE2GfgihBpgEVAMrgIdDCC+b2Y/MbCyAmR1vZhuBccAdZlZ250lt2vh6PPPmwcKF+T1WVZW3HY8YEU1tEq1UyifMbd0adyXxW7nSz+a//30fybZ0qS9IWEiDB3vn8K9/DZ98UthjJVFO4/RDCE+FEA4PIRwaQviPzHU/CCE8nvl+fgjhkBBClxDCASGEykYe47chhEnRll9cV1zhE3jyGb65eLF3en3jG7mPYZbiynbmLl8ebx1x2rMHfvUrOOYYXz/noYfggQd8eZJimDzZm1Pvv784x0sSzchthv32g8su81mD777bsseoqvJOr0svjbY2iU7SR/Bs2ABf/CJcfz2cdpo/DxdcUNwaTj3VN1nRZK3oKfSbaeJEH8Fw993N/91PPvEOqq99rXhnTNJ8Awf6/IykdeaG4KtdplLejDllii/5fdBBxa8lO1lrxQofwinRUeg30xFH+K4/t9/e/PV4/vAH2LJFM3BLXdu2vpRAks70N2/2k5ErrvDx9osXx98EeeGFcPDB2lkragr9Fpg40Tv6nn66eb9XVQWHHeYfXaW0JWkXrcce8xFLTz4J//Vf8Oyz/mknbh06+M5aM2YUZzOjpFDot8B55/l08eYM31y1Cp57zjdKUQdu6UulfIng99+Pu5LC+eADuPJK+MpXfCXMhQvh299u+TIKhTBhgg+e+MUv4q6kfCj0W6BDB38xPv107rss3Xmn/2caP76gpUlEsp255dqu/+yzvubTvffC974Hc+fW/s2lpEcPf2O6/374+9/jrqY8KPRbaMIEH7ufywSSnTt9S8RzzvFPCFL6yjn0V62CM87wzuq//hX+7d/8RKZU3XAD7Nrl4/Ylfwr9Furd25t57rrLZ9d+lunTfYMIdeC2Hn36+BIZ5diuP326D0KYOdNn2Za6QYNg7FgP/ab+r0nTFPp5mDTJ23wffPCz71dV5W8ShZ7JKNExK9/lGNJpGDbM39hai8mTfW7MvffGXUnrp9DPw6hRHgy33rr3CSTr13vb/1VXlVYHmTQtlfJRI+U0OWj7dh9Q8MUvxl1J85x8sm9f+otfFG7r0qRQ6OfBzIdvvviid4Q1JjuJ66qrileXRCOV8rPLTZviriQ6f/mLB39rC/3sZK2VK33BN2k5hX6eLr3U234bW49n924ftTNmDPTvX/TSJE/l2JlbXe2dtqNGxV1J840b582kmqyVH4V+nrp181mMjzzy6TPCGTN8HRMtodw6VWaWDSyndv10Gk46yTfwaW3at/f1gJ55xtfyl5ZR6Efguut8WGZVVf3rp071vUPHjo2nLslPr15wwAHlE/pvvQVLlsA//EPclbTcN77hb1iarNVyCv0IDB0Kp5/ue+jW1Ph1b78Njz/unwJKeQy07J1ZeS3HMHOmX7a29vy69t/f+8emTYM3c92/T+pR6Edk0iRvypk+3X++5x5/A7j66njrkvykUt6mXw4jeKqr4cADfSZua3bDDf5/K599LZJMoR+Rc87xcc+33eYBUVXlw8yGDIm7MslHKuV7Gm/cGHcl+dmzx/uYxozxmeSt2aGH+npBv/mNdjdriVb+z1862rXzzaJnzvS2/NWrNQO3HJRLZ+6SJT7QoDU37dR1003w3nu+/r80j0I/Qtdc4yMMJk2CffeF88+PuyLJV7mEfjrtl2PGxFtHVL7wBd9jWpO1mk+hH6FevXws8a5d8PWv+5Kw0rr16OEbeZRD6B95ZDy7YBVCdrLW6tW+D4DkTqEfsZtu8pC47rq4K5GotPYRPFu3wpw55dO0k/W1r0Hfvpqs1VwK/Ygde6xvvpFtFpDWL5XyvVp37467kpZ5/nmfR1Juod+unU/Wmj0bFi2Ku5rWQ6Ev0oTKSt/Uft26uCtpmXTa184/6aS4K4neNddA1646228Ohb5IE7Jr8LTWJp50Gk45BfbZJ+5Korfvvh78Dz3U+ofVFotCX6QJw4b5ZWsM/Y0bYfny8mvaqev6630ET3P2rE4yhb5IE7p2hQEDWmfoZ4dqlnPoDxgAX/0q3HEHfPxx3NWUPoW+SA6yyzG0Num0jyYr94EFkyfDli2+F7V8NoW+SA4qK+GVV3wUTGuxe7cvvfDFL/q49nJ24olwwglwyy2td5RVsSj0RXKQSvkiX6tXx11J7l580ZcqKOemnbomT4a1a+GJJ+KupLQp9EVy0BpH8FRX++UZZ8RbR7Gcdx7066fhm01R6IvkYPBg39i+NYV+Ou2TBSsq4q6kONq182WX58yB+fPjrqZ0KfRFctCpEwwa1Ho6cz/6CF54ITlNO1lXXw3du2tnrc+i0BfJUWVl6znTnz3b+yBa89aILdG9u2+p+PDDvqmRfFpOoW9mZ5rZSjNbY2Y3N3L7KDNbZGY1ZnZ+nev7mdlCM3vJzF42s29GWbxIMaVSsGaNL8lQ6tJp30v2xBPjrqT4vvUtv/zVr+Kto1Q1Gfpm1ha4DTgLGAZcbGbDGtxtPTAeeKDB9W8BXwghHA18HrjZzA7Ot2iROKRSvivaihVxV9K06mo49VTo2DHuSoqvXz/fy2LKFG/mkvpyOdMfAawJIbwaQtgJPAicW/cOIYTXQghLgD0Nrt8ZQtiR+bFjjscTKUmtZQTPunU+tDRp7fl1TZ4MH3wAd98ddyWlJ5cQ7g3UbR3bmLkuJ2bWx8yWZB7jpyGET+1hb2YTzGyBmS3YvHlzrg8tUlSHHQYdOpR+Z+6MGX6Z5NAfMQJGjtRkrcbkEvqNzeULuR4ghLAhhHAkcBhwhZn1auQ+U0IIw0MIwyuSMr5MWp127Xyj+1I/00+nfXORwYPjriRekyf7p54//SnuSkpLLqG/EehT5+dDgE+drTclc4b/MnByc39XpFSU+i5aNTUwa1Yyll5oyrnn+mJsmqxVXy6hPx8YZGYDzKwDcBHweC4PbmaHmNk+me/3B0YCK1tarEjcUilYvx4+/DDuSho3f74vPJbkpp2stm3hxht9vsLcuXFXUzqaDP0QQg0wCagGVgAPhxBeNrMfmdlYADM73sw2AuOAO8ws2+o5FJhnZouB54CfhRCWFuIPESmGbGduqbbrp9PQpg2MHh13JaXhyit9oxVN1qrVLpc7hRCeAp5qcN0P6nw/H2/2afh7M4Aj86xRpGTUDf1SHAOfTsPxx0OPHnFXUhq6dYMJE+DnP4fXXoP+/eOuKH4aQinSDP36QefOpdmuv2ULzJunpp2GvvUt79/QZC2n0BdphjZtSnc5hmee8eGJCv36+vSBCy6AqVNLty+mmBT6Is1UqiN40mlvzvj85+OupPRMnuyzc++8M+5K4qfQF2mmVArefhveeSfuSmqF4EsvnH46tG8fdzWlZ/hwOPlk+OUvfVhrkin0RZopu99sKY3gWbvWOyqTtqpmc0yeDK+/Do8+Gncl8VLoizRTKa7Bk077pdrz9+6cc3wpjaRP1lLoizTTwQfDfvuVVuhXV8PAgXDooXFXUrqyk7XmzoW//S3uauKj0BdpJrPS6szdtctH7ugsv2njx8P++yf7bF+hL9IC2dAPOS89WDhz58LHHyv0c9GlC1x7Lfzxj74YWxIp9EVaoLLSJ0O99VbclXh7ftu2PnJHmjZpks+3+OUv464kHgp9kRYopc7cdBpOOMHXmJGm9e4NF13kY/a3bIm7mvqK8clRoS/SAtlhm3GH/rvv+sqaatppnn/6J28Sq6qK5/iffAIvvQTTpsEPfgDjxvmJxNlnF/7YOS24JiL1VVRAr17xh/6sWX52qNBvnmOP9T2E/+d/4IYbCjeh7YMPfE/lFStg+fLa79etqz2rb9PGR14NHQpf+EJh6qhLoS/SQqUwgied9uGjxx8fbx2t0eTJMHYs/OEP3tzTUiH4DO1soNcN+Lp9Ph06wOGH++zgyy7zkB861K/r1Cn/vydXCn2RFqqs9HbhPXv8bK3YQvDQP+MM78iV5vnyl2HQIF92+cILm95pbM8e30Cn7hl7NuDr9g107ephPmYMDBtWG+4DBviWm3ErgRJEWqdUCrZu9an9AwYU//ivvAIbNsD3v1/8Y5eDNm28bf+66+Cvf4WTTvLrd+2CNWs+3SSzciVs21b7+xUVHuYXXuiX2YDv3bu0t6pU6Iu0UN0RPHGEfnbphTFjin/scnH55fC97/ma+wMHesivWVN/Uba+fT3MTzml9qx96FDo2TO+uvOh0BdpobojeM45p/jHT6e9PVi7QbVcly5w000+gmbbNj9bP++82mAfMsSba8qJQl+khbp397PAODpzd+yA2bPh6quLf+xy893vws03x9MvE4eE/JkihVFZGc8Syy+84GemGqoZjaQEPij0RfKSSnknX7E35qiu9rHlp55a3ONK66fQF8lDKgU7d3rnXzGl0z6Rp9zam6XwFPoieYhjDZ5Nm+DFF9W0Iy2j0BfJw9ChPia7mKE/c6ZfamtEaQmFvkge9tnHd6sqZmduOg0HHADHHFO8Y0r5UOiL5KmYa/Bkl14YMyZZI04kOnrZiOQplYLVq2H79sIfa9kyX8RL7fnSUgp9kTylUrB7t6/NUmjZpRcU+tJSCn2RPBVzBE867RPCevcu/LGkPCn0RfI0aJAvmVvoztxPPoHnn9dZvuRHoS+Spw4dYPDgwp/pz5nj/QYKfcmHQl8kAsUYwZNOQ8eOMGpUYY8j5S2n0DezM81spZmtMbObG7l9lJktMrMaMzu/zvVHm9nfzOxlM1tiZhdGWbxIqUilfN/Tjz8u3DHSaTj5ZOjcuXDHkPLXZOibWVvgNuAsYBhwsZkNa3C39cB44IEG128DLg8hVAJnAreY2X75Fi1SarKducuXF+bx33oLli5V047kL5cz/RHAmhDCqyGEncCDwLl17xBCeC2EsATY0+D6VSGE1Znv3wQ2ARWRVC5SQrIbqhSqM1dDNSUquYR+b2BDnZ83Zq5rFjMbAXQA1jb3d0VK3cCB0KlT4dr102no1QuOOKIwjy/JkUvoN7bFb2jOQczsIOBe4MoQwp5Gbp9gZgvMbMHmzZub89AiJaFtW99qrxChv2cPzJihpRckGrm8hDYCfer8fAjwZq4HMLPuwJPA90IIcxu7TwhhSghheAhheEWFWn+kdSrUCJ7Fi2HzZq2qKdHIJfTnA4PMbICZdQAuAh7P5cEz938U+F0I4ZGWlylS+lIpePNNeP/9aB83255/xhnRPq4kU5OhH0KoASYB1cAK4OEQwstm9iMzGwtgZseb2UZgHHCHmWW7sy4ARgHjzeylzNfRBflLRGJWqM7c6mo46ij43OeifVxJpna53CmE8BTwVIPrflDn+/l4s0/D37sPuC/PGkVahbpr8Jx0UjSPuXUr/OUvcOON0TyeiLqFRCLSpw906xZtu/5zz8GuXWrPl+go9EUiYhZ9Z2467btzjRwZ3WNKsin0RSKUDf3QrEHNe1ddDaec4nMARKKg0BeJUGUlvPsubNqU/2OtXw+vvKJZuBIthb5IhKLcUGXGDL9U6EuUFPoiEYoy9NNp3yFrWMPlDUXyoNAXidCBB0LPnvmH/u7dMHOmn+VbYwuhiLSQQl8kQlGN4Fm4EN57T007Ej2FvkjEKit9Vm4+I3jSaX8D0dILEjWFvkjEUin46CPYsKHp++5NOg3HHutNRSJRUuiLRCzfztwPP4S//U2zcKUwFPoiEcsuvNbS0J89G2pq1J4vhaHQF4nY/vv7UMuWhn51NXTpAieeGG1dIqDQFymIbGduS6TTcNpp0KFDtDWJgEJfpCBSKVi+3MfbN8err8KaNWrPl8JR6IsUQCoF27d69U/RAAAHe0lEQVR7iDeHll6QQlPoixRAS0fwVFdDv34waFD0NYmAQl+kILLr5TQn9GtqYNYsLb0ghaXQFymALl1gwIDmdeb+7//6GH017UghKfRFCqS5a/Ck09CmDYweXbiaRBT6IgWSSsHKlbBzZ273T6dhxAgf5y9SKAp9kQJJpbydftWqpu/7/vswb56adqTwFPoiBdKc5RieeQb27FHoS+Ep9EUKZPBgaNs2t87cdBq6d/fmHZFCUuiLFEinTj7evqkz/RB8fP7o0dC+fXFqk+RS6IsUUC4jeNasgddfV9OOFIdCX6SAUilYuxa2bdv7faqr/VKhL8Wg0BcpoMpKb75ZsWLv90mn4dBDYeDA4tUlyaXQFymg7Bo8e+vM3bkTnn1WZ/lSPAp9kQI67DBfF39v7fpz58LHH2spZSkehb5IAbVrB0OH7j30q6t9WOdppxW3Lkkuhb5IgX3WCJ502rdF7N69uDVJcuUU+mZ2ppmtNLM1ZnZzI7ePMrNFZlZjZuc3uO1pM9tiZtOjKlqkNamshA0b4IMP6l//zjuwcKHa86W4mgx9M2sL3AacBQwDLjazYQ3uth4YDzzQyEP8F3BZfmWKtF7Zztzly+tfP2uWj+xRe74UUy5n+iOANSGEV0MIO4EHgXPr3iGE8FoIYQmwp+EvhxBmAR9FUaxIa7S3XbTSaV9R87jjil+TJFcuod8b2FDn542Z60QkB/36+aYqdUM/u/TCGWd4R65IseQS+o1t3BaiLMLMJpjZAjNbsHnz5igfWiR2bdp4u37d0F+xAt54Q+35Uny5hP5GoE+dnw8B3oyyiBDClBDC8BDC8IqKiigfWqQkNAz9dNovFfpSbLmE/nxgkJkNMLMOwEXA44UtS6S8pFKwaRNkP8im0zBkCPTtG29dkjxNhn4IoQaYBFQDK4CHQwgvm9mPzGwsgJkdb2YbgXHAHWb2/yedm9kc4BFgtJltNDONVZDEqbscw44dMHu2zvIlHu1yuVMI4SngqQbX/aDO9/PxZp/GfvfkfAoUKQd1R/Ds3g2ffKLQl3jkFPoikp+DDvLhmcuW+USt9u3hlFPirkqSSKEvUgRmtZ25W7fCyJHQtWvcVUkSae0dkSJJpWDRInjpJc3Clfgo9EWKJJXytnxQe77ER6EvUiTZztyePeHoo+OtRZJLoS9SJJWVfjlmjM/SFYmDOnJFiqRnT/jP/4Szzoq7Ekkyhb5IEf3Lv8RdgSSdPmSKiCSIQl9EJEEU+iIiCaLQFxFJEIW+iEiCKPRFRBJEoS8ikiAKfRGRBLEQIt3jPG9mthl4Pe468tQTeCfuIkqIno/69HzU0nNRXz7PR78QQpObjJdc6JcDM1sQQhgedx2lQs9HfXo+aum5qK8Yz4ead0REEkShLyKSIAr9wpgSdwElRs9HfXo+aum5qK/gz4fa9EVEEkRn+iIiCaLQz5OZ9TGzZ81shZm9bGY3ZK7vYWYzzGx15nL/uGstFjNra2Yvmtn0zM8DzGxe5rl4yMw6xF1jsZjZfmb2ezN7JfMaOTHhr41/yvw/WWZm08ysU5JeH2Z2l5ltMrNlda5r9PVg7n/MbI2ZLTGzY6OoQaGfvxrgphDCUOAEYKKZDQNuBmaFEAYBszI/J8UNwIo6P/8U+EXmuXgfuDqWquLxS+DpEMIQ4Cj8eUnka8PMegPXA8NDCCmgLXARyXp9/BY4s8F1e3s9nAUMynxNAG6PpIIQgr4i/AIeA8YAK4GDMtcdBKyMu7Yi/f2HZF64pwPTAcMnm7TL3H4iUB13nUV6LroD68j0ndW5Pqmvjd7ABqAHvmvfdOAfkvb6APoDy5p6PQB3ABc3dr98vnSmHyEz6w8cA8wDeoUQ3gLIXB4YX2VFdQvwf4A9mZ8PALaEEGoyP2/E//MnwUBgM3B3prmrysy6kNDXRgjhDeBnwHrgLeADYCHJfX1k7e31kH2TzIrkuVHoR8TMugJ/AG4MIXwYdz1xMLOzgU0hhIV1r27krkkZMtYOOBa4PYRwDLCVhDTlNCbTVn0uMAA4GOiCN2E0lJTXR1MK8n9HoR8BM2uPB/79IYQ/Zq5+28wOytx+ELAprvqKaCQw1sxeAx7Em3huAfYzs3aZ+xwCvBlPeUW3EdgYQpiX+fn3+JtAEl8bAGcA60IIm0MIu4A/Al8gua+PrL29HjYCfercL5LnRqGfJzMz4E5gRQjhv+vc9DhwReb7K/C2/rIWQviXEMIhIYT+eAfdMyGErwPPAudn7paI5wIghPB3YIOZDc5cNRpYTgJfGxnrgRPMrHPm/032+Ujk66OOvb0eHgcuz4ziOQH4INsMlA9NzsqTmZ0EzAGWUtuO/V28Xf9hoC/+Yh8XQngvliJjYGanAt8OIZxtZgPxM/8ewIvApSGEHXHWVyxmdjRQBXQAXgWuxE+2EvnaMLMfAhfio95eBK7B26kT8fows2nAqfhqmm8D/xf4E428HjJvjLfio322AVeGEBbkXYNCX0QkOdS8IyKSIAp9EZEEUeiLiCSIQl9EJEEU+iIiCaLQFxFJEIW+iEiCKPRFRBLk/wEmxxVtxV1jewAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 绘制不同PCA维数下模型的性能，找到最佳模型／参数（分数最高）\n",
    "plt.plot(Ks, np.array(CH_scores), 'b-')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.15334441883250638, 0.13384652390546367, 0.15464525128955411, 0.15342015056744307, 0.15508252253741733, 0.10738250944103446, 0.1371370511392862, 0.14273793844828214, 0.12442431763566943, 0.12573799481633943]\n"
     ]
    }
   ],
   "source": [
    "print (CH_scores)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最佳值为K=50, CH_score=0.15508252253741733；由于CH_score<0.2，表明数据不存在聚类特征，即聚类效果不好。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
